Mitigating Unwanted Biases with Adversarial Learning

Authors

  • Brian Hu Zhang
  • Blake Lemoine
  • Margaret Mitchell
Abstract

Machine learning is a tool for building models that accurately represent input training data. When undesired biases concerning demographic groups are in the training data, well-trained models will reflect those biases. We present a framework for mitigating such biases by including a variable for the group of interest and simultaneously learning a predictor and an adversary. The input to the network X, here text or census data, produces a prediction Y, such as an analogy completion or income bracket, while the adversary tries to model a protected variable Z, here gender or zip code. The objective is to maximize the predictor's ability to predict Y while minimizing the adversary's ability to predict Z. Applied to analogy completion, this method results in accurate predictions that exhibit less evidence of stereotyping Z. When applied to a classification task using the UCI Adult (Census) Dataset, it results in a predictive model that does not lose much accuracy while achieving very close to equality of odds (Hardt et al., 2016). The method is flexible and applicable to multiple definitions of fairness as well as a wide range of gradient-based learning models, including both regression and classification tasks.
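
The predictor-versus-adversary setup described in the abstract can be sketched in a few lines. The following is a minimal PyTorch illustration of the general idea, not the authors' exact algorithm: the paper's update rule additionally subtracts the projection of the predictor's gradient onto the adversary's gradient, which this sketch omits, and all shapes and hyperparameters (input size 10, alpha = 1.0) are illustrative assumptions.

```python
# Minimal adversarial-debiasing sketch. Shapes, layer sizes, and
# alpha are illustrative assumptions, not the paper's settings.
import torch
import torch.nn as nn

predictor = nn.Linear(10, 1)              # predicts Y from input X
adversary = nn.Sequential(                # tries to recover Z from the
    nn.Linear(1, 8), nn.ReLU(),           # predictor's output logit
    nn.Linear(8, 1),
)

opt_pred = torch.optim.Adam(predictor.parameters(), lr=1e-3)
opt_adv = torch.optim.Adam(adversary.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
alpha = 1.0                               # adversary weight (assumption)

def train_step(x, y, z):
    # 1) Adversary step: learn to predict the protected variable Z
    #    from the (detached) prediction.
    y_logit = predictor(x)
    adv_loss = bce(adversary(y_logit.detach()), z)
    opt_adv.zero_grad()
    adv_loss.backward()
    opt_adv.step()

    # 2) Predictor step: stay accurate on Y while making the
    #    adversary's job as hard as possible.
    y_logit = predictor(x)
    pred_loss = bce(y_logit, y) - alpha * bce(adversary(y_logit), z)
    opt_pred.zero_grad()
    pred_loss.backward()
    opt_pred.step()
    return pred_loss.item(), adv_loss.item()
```

For the equality-of-odds variant used on the Census task, the adversary would also receive the true label Y alongside the prediction; feeding only the predicted logit, as here, corresponds to a demographic-parity-style objective.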


Similar Papers

Feature Squeezing Mitigates and Detects Carlini/Wagner Adversarial Examples

Feature squeezing is a recently-introduced framework for mitigating and detecting adversarial examples. In previous work, we showed that it is effective against several earlier methods for generating adversarial examples. In this short note, we report on recent results showing that simple feature squeezing techniques also make deep learning models significantly more robust against the Carlini/W...
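
As a concrete illustration of the idea, below is a minimal NumPy sketch of one standard squeezer, bit-depth reduction, together with a prediction-distance detection score. Here `model_probs` is a hypothetical callable returning class probabilities, and the choice of 4 bits and the L1 score are assumptions in the spirit of the feature-squeezing work, not this note's exact configuration.

```python
# Minimal feature-squeezing sketch: bit-depth reduction plus a
# prediction-distance detection score (illustrative assumptions).
import numpy as np

def reduce_bit_depth(x: np.ndarray, bits: int = 4) -> np.ndarray:
    """Quantize inputs in [0, 1] down to 2**bits levels."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

def squeezing_score(model_probs, x: np.ndarray) -> np.ndarray:
    """L1 distance between predictions on raw and squeezed inputs.
    A large distance flags a likely adversarial example."""
    p_raw = model_probs(x)
    p_squeezed = model_probs(reduce_bit_depth(x))
    return np.abs(p_raw - p_squeezed).sum(axis=-1)
```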


Addressing Covariate Shift in Active Learning with Adversarial Prediction

Active learning approaches used in practice are generally optimistic about their certainty with respect to data shift between labeled and unlabeled data. They assume that unknown datapoint labels follow the inductive biases of the active learner. As a result, the most useful datapoint labels, those that refute current inductive biases, are rarely solicited. We propose an adversarial approach to a...


Learning Adversarially Fair and Transferable Representations

In this work, we advocate for representation learning as the key to mitigating unfair prediction outcomes downstream. We envision a scenario where learned representations may be handed off to other entities with unknown objectives. We propose and explore adversarial representation learning as a natural method of ensuring those entities will act fairly, and connect group fairness (demographic pa...
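
The handoff scenario described above is commonly implemented by placing the adversary on the learned representation itself, for example via a gradient-reversal layer. The sketch below is a generic illustration of that pattern, not this paper's specific architecture or training procedure; the layer sizes and the names `encoder`, `task_head`, `adversary`, and `s` (the sensitive attribute) are assumptions.

```python
# Generic adversarial-representation sketch using gradient reversal;
# an illustration of the pattern, not the paper's exact method.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; flips the gradient's sign on the
    backward pass, so the encoder learns to hurt the adversary."""
    @staticmethod
    def forward(ctx, x):
        return x

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

encoder = nn.Sequential(nn.Linear(10, 16), nn.ReLU())
task_head = nn.Linear(16, 1)   # downstream task on the representation
adversary = nn.Linear(16, 1)   # tries to recover the sensitive attribute
bce = nn.BCEWithLogitsLoss()

def loss_fn(x, y, s):
    h = encoder(x)             # the representation that gets handed off
    task_loss = bce(task_head(h), y)
    adv_loss = bce(adversary(GradReverse.apply(h)), s)
    # A single optimizer over all parameters suffices: the adversary
    # minimizes adv_loss, while the reversed gradient makes the
    # encoder maximize it, scrubbing the attribute from h.
    return task_loss + adv_loss
```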


Mitigating Evidentiary Bias in Planning and Policy-Making; Comment on “Reflective Practice: How the World Bank Explored Its Own Biases?”

The field of cognitive psychology has increasingly provided scientific insights to explore how humans are subject to unconscious sources of evidentiary bias, leading to errors that can affect judgement and decision-making. Increasingly these insights are being applied outside the realm of individual decision-making to the collective arena of policy-making as well. A recent editorial in this jou...


Summoning Demons: The Pursuit of Exploitable Bugs in Machine Learning

Governments and businesses increasingly rely on data analytics and machine learning (ML) for improving their competitive edge in areas such as consumer satisfaction, threat intelligence, decision making, and product efficiency. However, by cleverly corrupting a subset of data used as input to a target’s ML algorithms, an adversary can perturb outcomes and compromise the effectiveness of ML tech...



Journal:
  • CoRR

Volume: abs/1801.07593

Publication date: 2018